feat(daemon): process supervision - llama-server lifecycle + status #69
Merged
…wn-gate TDD red-pass for Task 2 (Track 1 item 2). Tests reference symbols that don't yet exist; CI compile-fail IS the red. The green commit follows.

supervisor.rs:
- SupervisorStatus default = NotSpawning
- SupervisorPolicy default uses documented constants
- compute_backoff doubles then clamps at max (u8::MAX-safe)
- INTEGRATION: supervise() with immediate-exit Command + unresponsive health URL surfaces FailedAfterMaxRestarts after exactly max_restarts
- INTEGRATION: supervise() returns cleanly with Stopped on shutdown signal

settings.rs:
- Settings::default().llama_server_binary == None
- TOML round-trip for the new field
- Legacy settings.toml without the field still parses (None)

bootstrap.rs (the spawn-gate decision matrix):
- LlamaCpp + binary set + model file present → SPAWN
- backend = Ollama → DO NOT SPAWN (per directive: Ollama supervises itself)
- backend = Auto → DO NOT SPAWN (preserve pre-v0.1.2 behavior; opt-in only)
- llama_server_binary = None → DO NOT SPAWN (directive's preserve-current clause)
- model file missing on disk → DO NOT SPAWN (conservative; chat fails with the existing 'backend not reachable' message rather than burn the budget)
- model_id unset → DO NOT SPAWN

Design choices made (operator declined the multi-choice; recommended paths taken):
- Status surface = new Tauri command + tracing::error! on Failed (not a tray-icon hint; UI panel is Track 1 item 3)
- ctx_len = hardcoded 4096 const (no new setting field this PR)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
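For readers following the decision matrix above, here is a minimal sketch of how such a spawn gate could be written. It is not the code in this PR: `Backend`, the trimmed-down `Settings` struct, and `should_spawn` are hypothetical stand-ins, and the `<models_dir>/<model_id>.gguf` layout is assumed from the PR description.

```rust
use std::path::Path;

/// Hypothetical stand-in for the daemon's backend setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Auto,
    Ollama,
    LlamaCpp,
}

/// Hypothetical, trimmed-down settings: only the fields the gate reads.
#[derive(Debug, Clone)]
pub struct Settings {
    pub backend: Backend,
    pub llama_server_binary: Option<String>,
    pub model_id: Option<String>,
}

/// Returns true only when every precondition for spawning holds.
pub fn should_spawn(settings: &Settings, models_dir: &Path) -> bool {
    // Ollama supervises itself; Auto keeps the pre-supervision behavior.
    if settings.backend != Backend::LlamaCpp {
        return false;
    }
    // No binary configured means the user never opted in.
    if settings.llama_server_binary.is_none() {
        return false;
    }
    // Without a model id there is nothing to load.
    let Some(model_id) = settings.model_id.as_deref() else {
        return false;
    };
    // Assumed layout: <models_dir>/<model_id>.gguf must be a regular file.
    models_dir.join(format!("{model_id}.gguf")).is_file()
}
```

Ordering the checks from cheapest to most expensive means the filesystem is only touched once the configuration has already opted in.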
…uri status Minimal impl to satisfy the RED tests. Settings opt-in, Ollama no-spawn, clean shutdown via tokio watch channel.

supervisor.rs:
- SupervisorStatus enum (NotSpawning/Starting/Running/Restarting/FailedAfterMaxRestarts/Stopped); Serialize for Tauri command return
- SupervisorPolicy: max_restarts=3, initial_backoff=500ms, max_backoff=10s, stability_window=60s; per-test override OK
- compute_backoff(attempt, initial, max): exponential, u8::MAX-safe
- supervise(): start → wait_for_http_ready → monitor_until_dead loop with shutdown-aware tokio::select on every wait. tracing::error on FailedAfterMaxRestarts. Stability-window reset for "stable, then crashed" process.
- Module-level doc explicitly states Ollama is NOT supervised here and why (own service installer, would fight ollama-svc's restart logic).

settings.rs:
- llama_server_binary: Option<String>, #[serde(default)] for forward-compat

bootstrap.rs:
- should_spawn_llama_supervisor(settings, models_dir): the decision matrix. All five "no" branches covered by RED tests.
- build_llama_supervisor(settings, models_dir) -> Option<(Arc<Sup>, Policy)>: resolves binary + model path + health URL + port; returns None when spawn-gate refuses (so main.rs branches cleanly).
- parse_port helper (tiny; avoids pulling in `url` crate).
- LlamaSupervisorState: holds Arc<Mutex<SupervisorStatus>> + the shutdown watch::Sender (in Mutex<Option<_>> so an exit hook can .take() it later).

commands.rs:
- get_supervisor_status Tauri command: returns the live status enum (Serialize-flat, snake_case tag).

main.rs:
- Stop allowing-dead-code on supervisor (it's live now).
- Setup branch: build supervisor → spawn supervise loop on tauri runtime → manage state. NotSpawning fallback when spawn-gate refuses.
- Register get_supervisor_status in invoke_handler.
- Clean shutdown: dropping LlamaSupervisorState drops the watch::Sender, which signals the supervise loop, which calls Supervisor::stop() (kills child) and sets Stopped. Supervisor::Drop is the belt; watch is suspenders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
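The restart accounting described above (a fixed budget plus a stability-window reset) can be illustrated without any async machinery. The sketch below is a simplification under stated assumptions: `RestartBudget`, `Verdict`, and `on_exit` are hypothetical names, and the real `supervise()` loop may count attempts differently.

```rust
use std::time::Duration;

/// Hypothetical restart-budget tracker, not the PR's actual type.
#[derive(Debug)]
pub struct RestartBudget {
    max_restarts: u32,
    stability_window: Duration,
    used: u32,
}

/// What the supervisor should do after the child exits.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Restart { attempt: u32 },
    GiveUp { attempts: u32 },
}

impl RestartBudget {
    pub fn new(max_restarts: u32, stability_window: Duration) -> Self {
        Self { max_restarts, stability_window, used: 0 }
    }

    /// Record that the child died after being healthy for `healthy_for`.
    pub fn on_exit(&mut self, healthy_for: Duration) -> Verdict {
        // A process that stayed healthy for the whole window is treated as
        // a fresh start, so an old crash streak no longer counts against it.
        if healthy_for >= self.stability_window {
            self.used = 0;
        }
        // Budget exhausted: report failure instead of restarting again.
        if self.used >= self.max_restarts {
            return Verdict::GiveUp { attempts: self.used };
        }
        self.used += 1;
        Verdict::Restart { attempt: self.used }
    }
}
```

With a budget of 2, three consecutive immediate exits yield `Restart { attempt: 1 }`, `Restart { attempt: 2 }`, then `GiveUp { attempts: 2 }`; an exit after a healthy run longer than the window starts again at attempt 1.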
Format drifts caught by CI (no local rustfmt on this machine):
- main.rs: assignment line-break before match
- supervisor.rs: info!() multi-line; compute_backoff multi-arg break; assert_eq! split

Frontend:
- settings.ts: llama_server_binary field on Settings interface
- settings/+page.svelte: input row, placeholder communicates None=do-not-spawn
- npm run check clean (258 files, 0 errors)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three clippy gripes on the supervisor:
1. supervise() loop: `last_error` init value is provably unread on all real paths but is sound as a sentinel; annotate #[allow(unused_assignments)] rather than restructure with Option (cheaper for a guaranteed-overwritten variable).
2. spawn_vllm_server: kept for the future NVIDIA-supervision path (already tested), not wired into bootstrap this PR. #[allow(dead_code)] with a doc comment explaining the deferred path.
3. LlamaSupervisorState.shutdown: observed via Drop semantics (dropping the watch::Sender wakes the supervise loop), not direct reads. Annotated and accompanied by a stub signal_shutdown method for the future ExitRequested hook. dead_code allow scoped to the field + method.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenCircuitDev pushed a commit that referenced this pull request on Jun 11, 2026:
#56 verdict
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Track 1, item 2 from `docs/AGENT_OPERATIONS.md`: activate the dead-code supervisor module. The daemon can now spawn and supervise its own `llama-server` instead of requiring the user to hand-run it, with health-gated restart, exponential backoff, a max-restart budget, and a Tauri status command for the UI to surface failure.

Ollama is intentionally not supervised here (it has its own service installer + lifecycle); the spawn-gate refuses when `backend = "ollama"`, and the module doc explains why.

What changed
Rust (`crates/ocm-daemon/`)

- `settings`: one new field, `llama_server_binary: Option<String>`, with `#[serde(default)]` for forward-compat. Doc explicitly notes "no-op when backend = ollama".
- `supervisor`: was already a partial module (Supervisor struct, spawn helpers, `wait_for_http_ready`); now activated. Added:
  - `SupervisorStatus` enum (Serialize, snake_case tag): `NotSpawning` / `Starting` / `Running { pid }` / `Restarting { attempt, last_error }` / `FailedAfterMaxRestarts { attempts, last_error }` / `Stopped`.
  - `SupervisorPolicy` with defaults: `max_restarts=3`, `initial_backoff=500ms`, `max_backoff=10s`, `stability_window=60s`, `health_check_interval=5s`, `health_check_timeout=15s`.
  - `compute_backoff(attempt, initial, max)`: exponential, `u8::MAX`-safe (no shift overflow).
  - `supervise(supervisor, policy, status, shutdown)`: health-gated restart loop. Every wait is `tokio::select!`-raced against the shutdown signal. Stability-window reset means a process that ran healthy then crashed isn't penalized as a flap. `tracing::error` when the budget is exhausted.
  - `spawn_vllm_server` kept (already tested) but explicitly `#[allow(dead_code)]`; NVIDIA supervision is a separate follow-up.
- `bootstrap`: wire-points:
  - `should_spawn_llama_supervisor(settings, models_dir) -> bool`: the spawn-gate decision. Requires `backend = LlamaCpp` AND `llama_server_binary.is_some()` AND the model GGUF exists at `models_dir/<model_id>.gguf` (matching `ocm_models::downloader` convention). Six test cases cover the matrix.
  - `build_llama_supervisor(settings, models_dir) -> Option<(Arc<Supervisor>, SupervisorPolicy)>`: resolves binary + model path + port (from `inference_base_url`) + health URL (`/v1/models`); returns None when the gate refuses.
  - `LlamaSupervisorState`: what `main.rs` `app.manage()`s. Holds the shared status `Arc<Mutex<_>>` plus the shutdown `watch::Sender` (in `Mutex<Option<_>>` so a future `RunEvent::ExitRequested` hook can `.take()` it). Dropping the state is sufficient for clean shutdown in v0.1.2; a `signal_shutdown` stub is included for the future hook.
- `commands`: `get_supervisor_status` Tauri command, returns the live `SupervisorStatus`.
- `main`: supervisor wired into `setup()`: build → spawn supervise loop on tauri runtime → `app.manage(supervisor_state)`. Falls cleanly back to `NotSpawning` when the gate refuses. Removed the `#[allow(dead_code)] mod supervisor;` annotation (it's live now).

Frontend: the `Settings` interface gains `llama_server_binary`; the settings page gains a path input with a placeholder communicating None=do-not-spawn.

TDD audit trail
- 115286e test: RED
- 35849fc feat: GREEN
- 84e746b fix: rustfmt + frontend
- 44253c4 fix(clippy): `unused_assignments`, `dead_code` on intentional-future-use code

New test coverage (+17 tests)
- `supervisor.rs` (+5): `SupervisorStatus::default()`; `SupervisorPolicy::default()` lockstep with constants; `compute_backoff` schedule (incl. `u8::MAX` safety); `supervise()` integration #1: immediate-exit Command + unresponsive health URL → exactly `max_restarts=2` attempts → `FailedAfterMaxRestarts`; `supervise()` integration #2: long-running sleep + early shutdown signal → `Stopped` + child reaped.
- `settings.rs` (+3): default `llama_server_binary == None`; TOML round-trip; legacy file (no field) still parses.
- `bootstrap.rs` (+9): full spawn-gate decision matrix (6 cases: yes / Ollama-no / Auto-no / binary-None-no / file-missing-no / model_id-None-no), `parse_port` helper (2 cases), `build_llama_supervisor` returns None when gate refuses.

ocm-daemon test count: 25 → 42 (verified in run 27381788960 log).
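As an illustration of the backoff schedule those tests pin down (doubling, clamped at a maximum, safe for `attempt = u8::MAX`), a minimal sketch might look like the following. The signature is an assumption: the PR only names `compute_backoff(attempt, initial, max)`, so the `u8` attempt type, the 0-based attempt numbering, and the use of `Duration` are guesses rather than the actual implementation.

```rust
use std::time::Duration;

/// Backoff before restart `attempt` (assumed 0-based):
/// `initial * 2^attempt`, clamped to `max`.
pub fn compute_backoff(attempt: u8, initial: Duration, max: Duration) -> Duration {
    // checked_shl returns None once the shift reaches the bit width (32),
    // so large attempts saturate instead of panicking or wrapping.
    let factor = 1u32.checked_shl(u32::from(attempt)).unwrap_or(u32::MAX);
    // saturating_mul guards the Duration multiplication against overflow;
    // min applies the clamp.
    initial.saturating_mul(factor).min(max)
}
```

With the documented defaults (`initial_backoff=500ms`, `max_backoff=10s`) this sketch yields 500ms, 1s, 2s, 4s, 8s, and then 10s for every later attempt, including `u8::MAX`.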
Design choices made
Per AGENT_OPERATIONS "NEEDS_APPROVAL when not covered by spec": the operator declined the multi-choice up front, so I took my own recommended paths and documented them:
- Status surface: `get_supervisor_status` + `tracing::error` on `FailedAfterMaxRestarts`. No tray-icon hint, no UI panel in this PR (UI polish = Track 1 item 3).
- `ctx_len`: hardcoded `DEFAULT_LLAMA_CTX_LEN = 4096` const (matches implementation-plan example). No new Settings field; revisit in item 3 if needed.
- Missing model file: if `model_id` is set but the GGUF doesn't exist on disk, refuse to spawn rather than burn the restart budget on a server with nothing to load. Chat fails loudly via the existing "backend not reachable" message instead.
- `backend = Auto`: no spawn; supervision is opt-in via `backend = "llamacpp"`. Preserves pre-v0.1.2 behavior for users who never opted into supervision.

Test plan
- `cargo clippy --workspace --all-targets -- -D warnings`: green on ubuntu/macOS/windows (run 27381788960)
- `cargo test --workspace`: 42 ocm-daemon tests + others green on all 3 platforms (run 27381788960)
- `npm run check`: 258 files, 0 errors (local)
- Point `llama_server_binary` at a real binary, download a registry model, launch `cargo tauri dev`, observe llama-server spawned + supervised. Not run here (no local Rust toolchain).
- Observe `FailedAfterMaxRestarts` after 3 attempts via `get_supervisor_status` (frontend hook is follow-up).

Out of scope (deferred)
- NVIDIA supervision: the `spawn_vllm_server` helper still exists, still tested, but explicitly not wired (heavier preconditions; separate follow-up).
- `RunEvent::ExitRequested` hook: current clean shutdown is Drop-of-`watch::Sender` (sufficient; the loop notices, calls `Supervisor::stop()`, sets `Stopped`). An explicit hook + join-handle wait would be more graceful; the `signal_shutdown` stub is ready for it.

🤖 Generated with Claude Code